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A  RESTRICTED  ADDITIVE  SCHWARZ  PRECONDITIONER  FOR  GENERAL  SPARSE 

LINEAR  SYSTEMS 


XIAO-CHUAN  CAP  AND  MARCUS  SARKIS1" 

Abstract.  We  introduce  some  cheaper  and  faster  variants  of  the  classical  additive  Schwarz  precondi¬ 
tioner  (AS)  for  general  sparse  linear  systems  and  show,  by  numerical  examples,  that  the  new  methods  are 
superior  to  AS  in  terms  of  both  iteration  counts  and  CPU  time,  as  well  as  the  communication  cost  when 
implemented  on  distributed  memory  computers.  This  is  especially  true  for  harder  problems  such  as  indefinite 
complex  linear  systems  and  systems  of  convection-diffusion  equations  from  three-dimensional  compressible 
flows.  Both  sequential  and  parallel  results  are  reported. 

Key  words.  Overlapping  domain  decomposition,  preconditioner,  iterative  method,  sparse  matrix 

AMS(MOS)  subject  classifications.  65N30,  65F10 

1.  Introduction.  In  this  paper,  we  introduce  some  modified  overlapping  additive  Schwarz  precondi¬ 
tioners  for  sparse  linear  systems.  The  original  additive  Schwarz  method  (AS)  was  introduced  for  solving 
symmetric  positive  definite  elliptic  finite  element  problems,  and  was  later- extended  to  many  other  nonsym- 
metric  and  non-elliptic  systems;  see,  e.g,,  [7,  8,  16].  AS  is  available  in  several  large  parallel  software  libraries, 
such  as  PETSc  [1]  and  P-SPARSLIB  [14].  We  here  propose  a  very  simple  change,  and  the  resulting  algorithm 
is  more  effective  in  terms  of  both  iteration  numbers  and  CPU  time  on  sequential  and  parallel  computers. 
We  have  tested  the  method,  referred  to  as  the  restricted  additive  Schwarz  method  (RAS),  for  a  wide  range 
of  problems  including  convection- diffusion  equations,  indefinite  complex  Helmholtz  equations  and  the  3D 
compressible  Euler’s  equation  discretized  on  unstructured  meshes.  We  shall  present  the  new  methods  as 
algebraic  preconditioners  for  general  sparse  linear  systems. 

RAS  was  found  accidentally.  While  working  on  a  AS/GMRES  algorithm  in  a  Euler  simulation,  we 
removed  part  of  the  communication  routine  and  surprisingly  the  “then  AS”  method  converged  faster  both 
in  terms  of  iteration  counts  and  CPU  time.  We  note  that  RAS  is  the  default  parallel  preconditioner  for 
nonsymmetric  sparse  linear  systems  in  PETSc  [1],  and  has  been  used  in  several  applications  ([11,  13]). 

The  paper  is  organized  as  follows.  We  devote  §2  to  the  description  of  RAS  and  other  variants.  Several 
case  studies  are  given  in  §3.  We  conclude  the  paper  by  a  few  remarks  in  §4. 

2.  A  restricted  additive  Schwarz  preconditioner.  We  consider  a  linear  system, 

(1)  Ax  —  b 

where  A  —  ( )  is  an  n  x  n  nonsingular  sparse  matrix  having  a  non-zero  pattern  that  is  symmetric.  To 
describe  the  algebraic  Schwarz  algorithm,  as  in  [6],  we  define  a  graph  G  =  ( W,E ),  where  the  set  of  vertices 
W  =  {1,  •  ■  •  ,n}  represents  the  n  unknowns  and  the  edge  set  E  —  \  chj  ^  0}  represents  the  pairs  of 

vertices  that  are  coupled  by  a  nonzero  element  in  A.  Since  we  assume  that  the  non-zero  pattern  is  symmetric, 
the  adjacency  graph  G  is  undirected.  For  multi-component  problems,  a  vertex  in  G  often  represents  several 
unknowns  in  (1)  that  are  associated  with  a  single  mesh  point.  We  confine  our  discussion  to  the  single 
component  case  and  the  extension  to  other  cases  is  straightforward.  For  the  remaining  discussion,  we  will 
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F49620-97- 1-0059,  and  in  part  by  the  National  Aeronautics  and  Space  Administration  under  NASA  contract 
NAS  1-1 9480  while  the  author  was  in  residence  at  the  Institute  for  Computer  Applications  in  Science  and 
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assume  that  the  graph  partitioning  has  been  applied  and  has  resulted  in  N  nonoverlapping  subsets  W®  whose 
union  is  W.  We  define  the  overlapping  partition  of  W  as  follows.  Let  {W}}  be  the  one-overlap  partition  of 
W,  where  W}  D  W®  is  obtained  by  including  all  the  immediate  neighboring  vertices  of  the  vertices  in  W®. 
Using  the  idea  recursively,  we  can  define  a  ^-overlap  partition  of  W, 

N 

w  =  \Jwf, 

i=  1 

where  Wf  D  Wf  with  S  levels  of  overlaps  with  its  neighboring  subdomains.  Here  5  >  0  is  an  integer. 
Associated  with  each  W®  we  define  a  restriction  operator  R In  matrix  terms,  R®  is  an  n  x  n  sub-identity 
matrix  whose  diagonal  elements  are  set  to  one  if  the  corresponding  node  belongs  to  W®  and  to  zero  otherwise. 
Similarly  we  can  define  Rf  for  each  W* .  With  this  we  define  the  matrix, 

Ai  =  RSiAR l  . 

Note  that  although  Ai  is  not  invertible,  we  can  invert  its  restriction  to  the  subspace 

Ai  1  =  ((Ai)|L.)  , 

where  .Li  is  the  vector  space  spanned  by  the  set  W?  in  Rn.  Recall  that  the  regular  AS  preconditioner  is 
defined  as 


Mil  =  ^  Rf. 

Our  new  RAS  algorithm  can  be  simply  described  as  follows 

Algorithm  1  (Restricted  Additive  Schwarz).  Solve  the  equation 

M ras  Ax  =  MR\sb 

by  a  Krylov  subspace  method,  where  the  preconditioner  MR\S  is  defined  by 

Mras  ~  RiA\  R\  +  *  *  *  +  R^A^R^  . 

Remark  2.1.  Unless  (5  =  0,  MR\S  is  nonsymmetric  even  for  symmetric  A.  For  certain  symmetric 
matrices  A,  for  example  the  five-point  Poisson  matrix,  RAS/GMRES  is  more  effective  than  AS/CG  in  terms 
of  iteration  numbers,  CPU  and  communication  times,  although  more  memory  is  needed  to  save  the  Krylov 
vectors  in  GMRES.  If  S  =  0,  MR\S  reduces  to  the  usual  block  diagonal  preconditioner. 

Remark  2.2.  A  multiplicative  version  of  RAS  can  be  derived  easily  as  in  [6]. 

REMARK  2.3.  In  a  parallel  implementation,  half  of  the  communication  cost  can  be  saved  because  R^x 
does  not  involve  any  data  exchange  with  the  neighboring  processors.  This  is  the  main  motivation  for  us  to 
design  this  algorithm  in  replacing  the  AS  method. 

Remark  2.4.  There  is  another  version  of  the  preconditioner,  referring  to  as  the  additive  Schwarz  with 
harmonic  extension  (ASH),  defined  as  follows 

Mash  =  1Ri  -f - f*  R5NANXR%  . 

It  turns  out  RAS  and  ASH  have  similar  behavior  for  all  the  numerical  tests  we  report  in  the  next  section. 
Remark  2.5.  Both  RAS  and  ASH  can  be  symmetrized  as 

Mrash  ~  +  •  •  •  4-  R%ANlR%  . 
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Table  1 

Iteration  counts  for  the  Poisson  and  the  convection-diffusion  problems  on  a  128  x  128  mesh . 


subd  =  16,  coarse  4x4 

subd  =  64,  coarse  8x8 

bi  = 

0 

II 

*0 

61  = 

10,  62  =  20 

61  = 

62-0  ; 

bi  = 

10,62  =  20 

(5 

AS 

RAS 

AS 

RAS 

AS 

RAS 

AS 

RAS 

1 

20 

17 

23 

20 

20 

18 

22 

20 

2 

18 

14 

20 

17 

18 

15 

19 

17 

3 

16 

13 

19 

15 

16 

13 

18 

15 

Our  numerical  tests  show  that  M^\SH  is  always  weaker  than  M^\s. 

REMARK  2.6.  In  practice ,  whenever  possible ,  a  coarse  preconditioner  is  often  incorporated  either  ad- 
ditively  or  multiplicatively  to  the  Schwarz  type  preconditioners  to  make  the  convergence  rate  independent  of 
the  number  of  subspaces.  Details  can  be  found  in  [7,  16}. 

Remark  2.7.  Some  improvement  can  be  obtained  if  the  restriction  operator  is  defined  by  using  certain 
weights  such  that  the  sum  of  the  operators  is  equal  to  the  identity  matrix.  For  example,  we  can  define 
Rf  u  as  an  n  x  n  diagonal  matrix  whose  diagonal  element  is  set  to  zero  if  the  corresponding  node  does  not 
belong  to  Wf ,  and  to  1/k  if  the  node  belongs  to  Wf  and  k  —  1  other  subdomains.  This  weighted,  restriction 
operator  can  also  be  used  to  improve  the  classical  AS.  For  example,  a  weighted  AS  can  be  defined  as  = 

Ri^ApRi  +  •  ■  •  +  Rn,ua^r5n  . 

3.  Case  studies.  We  provide  some  numerical  examples  that  compare  RAS  with  the  regular  AS 
method.  In  all  the  test  runs,  we  always  use  GMRES  ([15])  as  the  accelerator  even  if  the  matrix  is  symmetric 
positive  definite.  We  stop  the  iteration  when  the  Euclidean  norm  of  the  preconditioned  initial  residual  is 
reduced  by  a  factor  of  10~6.  GMRES  restarts  at  30  for  all  2D  test  problems  and  5  for  the  3D  test  problem. 
subd  denotes  the  number  of  subdomains.  A  non-nested  coarse  space  ([2])  is  included  in  all  the  2D  tests. 

The  first  test  is  for  a  two-dimensional  convection-diffusion  problem  Lu  —  —A u  -f  biux  +  b2Uy  with 
zero  Dirichlet  boundary  condition  on  the  unit  square.  The  equation  is  discretized  with  the  usual  five-point 
finite  difference  scheme  with  a  first  order  upwinding  if  6162  7^  0.  An  128  x  128  fine  mesh  is  partitioned  into 
16  and  64  subdomains.  GMRES  restarts  at  30.  The  iteration  numbers  can  be  found  in  Table  1  for  the 
Poisson  case  when  61  =  62  =  0  and  for  a  more  general  case  with  61  =  10, 62  =  20.  RAS  indeed  takes  fewer 
number  of  iterations.  For  the  Poisson  problem,  we  plot  all  the  eigenvalues,  computed  with  MATLAB,  of  the 
preconditioned  matrix  in  Fig.  1.  We  note  that  some  of  the  eigenvalues  of  the  RAS  preconditioned  matrix 
are  complex,  and  if  we  only  consider  the  real  part,  the  largest  eigenvalue  of  the  RAS  preconditioned  matrix 
is  much  smaller  than  the  largest  eigenvalue  of  the  AS  preconditioned  matrix.  The  difference  between  the 
smallest  eigenvalues  is  tiny.  This  is  probably  why  RAS  is  faster  than  AS,  although  eigen  information  may 
not  completely  reflect  the  convergence  of  GMRES. 

Our  second  test  is  for  a  two-dimensional  Helmholtz  equation  with  Sommerfeld  boundary  condition 
defined  on  the  unit  square,  i.e., 

in  £1 

on  dTt, 

where  i  =  and  k  >  0  is  a  real  constant,  d/dn  is  the  outward  normal  derivative.  /  is  chosen  so  that  the 
exact  solution  u  —  cos (k(x  -j-  y))  T  isin(k(x  +  y)).  We  use  a  standard  h  finite  element  discretization  [3,  12]. 


(2) 


f  —A  u  —  k2u  =  f 
du 


^  dn 


—  iku  =  0 
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Fig.  1.  The  distribution  of  eigenvalues  of  the  left-preconditioned  matrix.  The  left  figure  is  for  AS  and 
the  right  one  is  for  RAS.  The  fine  mesh  is  64  x  64,  which  is  partitioned  into  16  subdomains.  6  =  2.  No 
coarse  grid. 

Table  2 

The  Helmholtz  problem  on  a  128  x  128  mesh  with  Sommerfeld  boundary  condition. 


subd  =  16 

subd  =  64 

AS 

RAS 

AS 

RAS 

1 

88 

41 

60 

27 

2 

103 

32 

71 

26 

3 

109 

33 

77 

27 

The  linear  system  is  complex  symmetric,  but  not  Hermitian.  A  complex  version  of  GMRES  is  used.  The 
fine  mesh  is  128  X  128  and  coarse  mesh  is  20  x  20.  16  and  64  subdomains  are  used  here.  The  frequency 
parameter  in  the  Helmholtz  operator  is  k  =  10.0.  Table  2  has  the  iteration  numbers.  It  is  surprising  that, 
mostly  for  AS,  the  number  of  iterations  increases  as  6  becomes  larger  and  also  surprising  that  RAS  takes  so 
many  fewer  iterations  than  AS. 

Our  third  test  is  for  the  steady-state  transonic  (Moo  =  0.89)  three-dimensional  compressible  Euler’s 
equation  discretized  on  a  fully  unstructured  mesh  ([5]).  The  system  is  discretized  with  a  second  order  finite 
volume  method  [10].  An  inexact  Newton’s  method  is  used  to  solve  the  nonlinear  system.  We  report  the  test 
results  for  solving  a  linear  system  whose  solution  provides  the  inexact  Newton  direction.  The  unstructured 
mesh  has  22012  nodes  and  118480  tetrahedra,  and  is  partitioned  by  using  a  graph  partitioning  scheme 
provided  in  the  TOP/DOMDEC  package  [9].  The  number  of  unknowns  of  the  linear  system  is  110060. 
Minimum  overlap  is  used,  i.e.,  6  =  1  and  CFL=400.0.  No  useful  coarse  space  is  available  for  this  test 
problem.  The  10-6  stopping  condition  may  appear  excessive  for  the  simulation,  but  it  is  nevertheless  useful 
to  demonstrate  the  efficiency  of  the  method.  We  note  however  that  a  similar  behavior  is  obtained  for  using 
looser  convergence  tolerance  and  for  unsteady  Euler  simulations.  The  CPU  and  communication  times,  for 
solving  the  whole  problem,  are  obtained  on  a  IBM  SP  machine.  Parallel  performance  for  up  to  32  processors 
is  reported  in  Table  3.  Other  interesting  numerical  results  for  using  RAS  in  parallel  3D  steady-state  Euler 
calculations  with  up  to  128  processors  have  also  been  reported  in  [11,  13]. 

4,  Concluding  remarks.  We  have  introduced  a  cheaper  and  faster  variant  of  the  classical  additive 
Schwarz  method  and  tested  it  for  a  few  challenging  problems  including  indefinite  Helmholtz  equations  in 
2D  and  the  compressible  Euler  equation  on  3D  unstructured  meshes.  All  tests  show  that  the  new  method  is 
superior  to  the  additive  Schwarz  method  in  terms  of  iteration  counts,  CPU  time  and  communication  costs 
if  implemented  in  parallel.  Some  theoretical  works  on  the  new  method  can  be  found  in  [4]. 
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Table  3 

Transonic  Euler’s  equation  on  a  3D  unstructured  mesh.  CPU  and  COMM  are  in  seconds .  The  number 
of  processors  is  the  same  as  the  number  of  subdomains  subd.  5  =  1  for  all  tests. 


AS 

RAS 

subd 

Iter 

CPU 

COMM 

Iter 

CPU 

COMM 

4 

131 

24.05 

1.10 

100 

19.31 

0.53 

8 

140 

14.80 

1.26 

105 

11.50 

0.69 

16 

145 

9.00 

1.44 

106 

6.78 

0.90 

32 

152 

7.67 

2.45 

110 

5.77 

1.17 

Acknowledgement:  We  thank  M.  Dryja,  C.  Farhat,  D.  Keyes,  B.  Smith,  and  O.  Widlund  for  many 
stimulating  discussions.  We  also  thank  the  referees  for  their  suggestions. 


REFERENCES 

[1]  S.  Balay,  W.  Gropp,  L.  McInnes  and  B.  Smith,  The  Portable,  Extensible  Toolkit  for  Scientific 

Computing,  version  2.0.13 ,  http://www.mcs.anl.gov/petsc,  code  and  documentation,  1996. 

[2]  X.-C.  Cai,  The  use  of  pointwise  interpolation  in  domain  decomposition  methods  with  non-nested  meshes , 

SIAM  J.  Sci.  Comput.,  16  (1995),  pp.  250-256. 

[3]  X.-C.  Cai,  M.  Casarin,  Jr.,  F.  Elliott,  Jr.,  O.  Widlund,  Overlapping  Schwarz  algorithms  for 

solving  Helmholtz’s  equation ,  The  Tenth  International  Conference  on  Domain  Decomposition  Meth¬ 
ods  for  Partial  Differential  Equations,  J.  Mandel,  C.  Farhat  and  X.-C.  Cai,  eds,  AMS,  1998.  (to 
appear) 

[4]  X.-C.  Cai,  M.  Dryja  and  M.  Sarkis,  A  convergence  theory  for  restricted  additive  Schwarz  methods , 

Tech.  Report,  Dept,  of  Computer  Science,  Univ.  of  Colorado  at  Boulder,  1998.  (to  appear) 

[5]  X.-C.  Cai,  C.  Farhat  and  M.  Sarkis,  A  minimum  overlap  restricted  additive  Schwarz  preconditioner 

and  applications  in  3D  flow  simulations ,  The  Tenth  International  Conference  on  Domain  Decompo¬ 
sition  Methods  for  Partial  Differential  Equations,  J.  Mandel,  C.  Farhat  and  X.-C.  Cai,  eds,  AMS, 
1998.  (to  appear) 

[6]  X.-C.  Cai  and  Y.  Saad,  Overlapping  domain  decomposition  algorithms  for  general  sparse  matrices , 

Numer.  Lin.  Alg.  Applic.,  3  (1996),  pp.  221-237. 

[7]  T.  Chan  and  T.  Mathew,  Domain  decomposition  algorithms,  Acta  Numerica  (1994),  pp.  61-143. 

[8]  M.  Dryja  and  O.  B.  Widlund,  Domain  decomposition  algorithms  with  small  overlap ,  SIAM  J.  Sci. 

Comput.,  15  (1994),  pp.  604-620. 

[9]  C.  Farhat,  S.  Lanteri  and  H.  Simon,  TOP /DOMDEC:  A  software  tool  for  mesh  partitioning  and 

parallel  processing  and  applications  to  CSM  and  CFD  computations,  Comput.  Sys.  Engrg.,  6  (1995), 
pp.  13-26. 

[10]  L.  FEZOUI  AND  B.  STOUFFLET,  A  class  of  implicit  upwind  schemes  for  Euler  simulations  with  unstruc¬ 

tured  meshes,  J.  Comp.  Phys.,  84  (1989),  pp.  174-206. 

[11]  W.  Gropp,  D.  Keyes,  L.  McInnes  and  M.  Tidriri,  Parallel  implicit  PDE  computations :  Algorithms 

and  software,  Proceedings  of  Parallel  CFD’97,  A.  Ecer,  et  al.,  edt.,  Manchester,  UK,  1997. 

[12]  F.  Ihlenburg  AND  I.  Babuska,  Finite  element  solution  of  the  Helmholtz  equation  with  high  wave 

number,  Part  I:  The  h-version  of  the  FEM,  Computers  Math.  Applic.,  30  (1995),  pp.  9-37. 

[13]  D.  Keyes,  D.  Kaushik  and  B.  Smith,  Prospects  for  CFD  on  petaflops  systems ,  CFD  Review  1997, 

M.  Hafez  et  al.,  edt.,  1997. 

[14]  Y.  Saad  and  A.  Malevsky,  P-SPARSLIB:  A  portable  library  of  distributed  memory  sparse  itera¬ 

tive  solvers,  (version  2.15,  May  1997),  Technical  Report  UMSI  95-180,  Minnesota  Supercomputer 
Institute,  1995. 

[15]  Y.  Saad  AND  M.  Schultz,  GMRES:  A  generalized  minimum  residual  algorithm  for  solving  nonsym- 

metric  linear  systems,  SIAM  J.  Sci.  Stat.  Comput.,  7  (1986),  pp.  856-869. 

[16]  B.  Smith,  P.  Bjorstad,  AND  W.  Gropp,  Domain  Decomposition:  Parallel  Multilevel  Methods  for 

Elliptic  Partial  Differential  Equations ,  Cambridge  University  Press,  1996. 


5 


